R version 3.6.0 (2019-04-26) and the R-packages tidyverse [Version 1.2.1], rlang [Version 0.4.0], here [Version 0.1], brms [Version 2.9.0], tidybayes [Version 1.1.0], bayestestR [Version 0.2.2], modelr [Version 0.1.4], ggforce [Version 0.2.2], ggrepel [Version 0.8.1], and ggridges [Version 0.5.1] were used for data preparation, analysis, and presentation.

Models

Bayesian distributional models were fitted using the brms R-package. These models account for the fact that nLEDs are bounded between 0 and 1, with inflated counts at these bounds, which results in a non-normal distribution. Crucially, in contrast to models fitted under the assumption of normality, these models do not make predictions outside the possible range of values. At the time of writing, distributional models of this nature are only available for hierarchical data using the brms R-package, which requires model fitting to be performed within a Bayesian framework. Additionally, Bayesian models do not suffer from the non-convergence often associated with fitting complex models under a Frequentist framework.

Models were fitted to the data aggregated by subjects. This aggregation was performed to account for the restricted range of length-normalised Levenshtein edit distances (nLEDs). When attempting to fit models at the individual trial level, these models were unidentifiable, with poor posterior predictive checks and unacceptably high \(\hat{R}\)s for several terms.
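For clarity, a length-normalised Levenshtein edit distance can be sketched in base R. The scoring function below is a hypothetical illustration (normalising by the length of the longer string), not the exact implementation used here:

```r
# Hypothetical sketch of the outcome measure: the Levenshtein edit
# distance between a typed response and its target, normalised by the
# length of the longer string so that scores fall in [0, 1].
nled <- function(response, target) {
  dist <- drop(adist(response, target))        # Levenshtein distance (base R, utils)
  dist / pmax(nchar(response), nchar(target))  # length normalisation
}

nled("vot", "vot")  # identical strings: 0
nled("vat", "vot")  # one substitution over three characters: 1/3
```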

Zero-one Inflated Beta Distributions

The models were fitted using a zero-one inflated Beta distribution, which models the data as a Beta distribution for nLEDs excluding 0 and 1, and a Bernoulli distribution for binary nLEDs of 0 and 1. Thus, predictors in the model can affect four distributional parameters: \(\mu\) (mu), the mean of the nLEDs excluding 0 and 1; \(\phi\) (phi), the precision of the nLEDs excluding 0 and 1; \(\alpha\) (alpha; termed zoi, or zero-one inflation, in brms), the probability of an nLED of 0 or 1; and \(\gamma\) (gamma; termed coi, or conditional one-inflation, in brms), the conditional probability of a 1 given that a 0 or 1 has been observed. Larger values for these parameters are associated with (a) higher mean nLEDs in the range excluding 0 and 1, (b) tighter distributions of the nLEDs in the range excluding 0 and 1 (i.e. less variance), (c) more zero-one inflation in nLEDs, and (d) more one-inflation relative to zero-inflation in nLEDs. Predictors in this model can influence any and all distributional parameters in the model at once. For these models, a logit link is used for the \(\mu\), \(\alpha\), and \(\gamma\) distributional parameters, and a log link is used for the \(\phi\) distributional parameter.
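As an illustration of how the four parameters combine, a minimal sketch of this likelihood in base R follows. It assumes the mean/precision parameterisation of the Beta distribution used by brms; this is not the package's internal code:

```r
# Minimal sketch of the zero-one inflated Beta likelihood:
# mu = mean and phi = precision of the Beta part, alpha = zero-one
# inflation (zoi), gamma = conditional one-inflation (coi).
d_zoib <- function(y, mu, phi, alpha, gamma) {
  if (y == 0) {
    alpha * (1 - gamma)   # P(boundary) * P(zero | boundary)
  } else if (y == 1) {
    alpha * gamma         # P(boundary) * P(one | boundary)
  } else {
    # Beta density with shape1 = mu * phi, shape2 = (1 - mu) * phi
    (1 - alpha) * dbeta(y, mu * phi, (1 - mu) * phi)
  }
}

d_zoib(0, mu = 0.5, phi = 3, alpha = 0.2, gamma = 0.25)  # 0.2 * 0.75 = 0.15
d_zoib(1, mu = 0.5, phi = 3, alpha = 0.2, gamma = 0.25)  # 0.2 * 0.25 = 0.05
```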

Model Fitting

Model Specification

Three models were fitted in total: (1) assessing performance across conditions during the vocabulary test prior to literacy training; (2) assessing performance across conditions during the Testing phase following literacy training; and (3) assessing performance across conditions during the Testing phase following literacy training, using vocabulary test performance as a predictor. This latter model was not pre-registered, and instead serves an exploratory purpose. The models were specified as follows:

  • Vocabulary Test Model: nLEDs are predicted by population-level (fixed) effects of Variety Exposure condition and Word Type, and by group-level (random) effects of random intercepts by participant.
  • Testing Model: nLEDs are predicted by population-level (fixed) effects of Task, Variety Exposure condition, and Word Type, and by group-level (random) effects of random intercepts by participant.
  • Exploratory Covariate Testing Model: nLEDs are predicted by population-level (fixed) effects of mean nLED during the Vocabulary Test, Task, Variety Exposure condition, and Word Type, and by group-level (random) effects of random intercepts by participant.

Crucially, in all models, the group-level effects (i.e. random intercepts by participant) are correlated across all distributional terms.
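In brms, correlating group-level effects across distributional parameters is achieved with a shared identifier between the bars of each group-level term. The sketch below is a hypothetical rendering of the Testing model; the variable names (nled, task, exposure, word_type, participant) and the full-interaction fixed-effect structure are assumptions, not the authors' exact code:

```r
library(brms)

# Hypothetical sketch of the Testing model formula. The shared "|p|"
# identifier correlates the by-participant intercepts across the
# mu, phi, zoi, and coi distributional parameters.
testing_formula <- bf(
  nled ~ task * exposure * word_type + (1 | p | participant),
  phi  ~ task * exposure * word_type + (1 | p | participant),
  zoi  ~ task * exposure * word_type + (1 | p | participant),
  coi  ~ task * exposure * word_type + (1 | p | participant),
  family = zero_one_inflated_beta()
)
```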

Model Priors

Note that, by default, brms restricts priors on SD parameters to be positive (i.e. they are truncated at zero).

Here, priors are described first by their distribution, followed by the parameters that define that distribution. For example, a prior of \(\mathcal{N}(0, 1)\) describes a normal distribution with a mean of 0 and a standard deviation of 1. Similarly, a prior of \(\mathcal{logistic}(0, 1)\) describes a logistic distribution with a location of 0 and a scale of 1.

The following priors were used for the exposure model:

  • Intercept
    • \(\mu\): \(\mathcal{N}(0.5, 1)\)
    • \(\phi\): \(\mathcal{N}(3, 1)\)
    • \(\alpha\): \(\mathcal{logistic}(0, 1)\)
    • \(\gamma\): \(\mathcal{logistic}(0, 1)\)
  • Slope
    • \(\mu\): \(\mathcal{N}(0, 0.2)\)
    • \(\phi\): \(\mathcal{N}(0, 0.5)\)
    • \(\alpha\): \(\mathcal{N}(0, 4)\)
    • \(\gamma\): \(\mathcal{N}(0, 0.2)\)
  • SD
    • \(\mu\): \(\mathcal{N}(0, 1)\)
    • \(\phi\): \(\mathcal{N}(0, 1)\)
    • \(\alpha\): \(\mathcal{N}(0, 4)\)
    • \(\gamma\): \(\mathcal{N}(0, 5)\)
  • SD by Participant Number
    • \(\mu\): \(\mathcal{N}(0, 1)\)
    • \(\phi\): \(\mathcal{N}(0, 3)\)
    • \(\alpha\): \(\mathcal{N}(0, 10)\)
    • \(\gamma\): \(\mathcal{N}(0, 10)\)
  • Correlation
    • \(LKJ(2)\)

Informative priors were used on the \(\mu\) and \(\phi\) intercept terms. Here, the \(\mu\) term assumes a mean intercept centred on 0.5 with a standard deviation of 1, thus placing a relatively large prior probability on the entire range of nLEDs. The \(\phi\) term further assumes that values closer to the mean are more likely than those further away from the mean. Together, these predict that nLEDs in the Beta distribution are likely to be centred on 0.5 and extreme scores (i.e. towards the bounds) are less likely than those around the mean. Both \(\alpha\) and \(\gamma\) terms use weakly informative, regularising priors that are centred on zero but allow for a wide range of values.

For the slope terms, the priors assume no effect to small effects for each parameter in either direction. These assumptions are stronger than initially planned in our pre-registration (which used very weakly informative priors) in all instances, in order to improve model fit. Here, wider standard deviations for the prior on the \(\gamma\) parameter resulted in divergences during fitting; as such, this prior uses a relatively constrained standard deviation. Weakly informative, regularising priors were also used for all standard deviation terms. Finally, an \(LKJ(2)\) prior was used for the correlation between terms, which acts to down-weight perfect correlations (Vasishth et al., 2018 - CITATION).
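The exposure-model priors listed above can be expressed as brms `set_prior()` calls. The sketch below assumes the grouping variable is named `participant`; the distributional parameters are named `phi`, `zoi`, and `coi` in brms:

```r
library(brms)

exposure_priors <- c(
  # Intercepts
  set_prior("normal(0.5, 1)", class = "Intercept"),
  set_prior("normal(3, 1)",   class = "Intercept", dpar = "phi"),
  set_prior("logistic(0, 1)", class = "Intercept", dpar = "zoi"),
  set_prior("logistic(0, 1)", class = "Intercept", dpar = "coi"),
  # Slopes
  set_prior("normal(0, 0.2)", class = "b"),
  set_prior("normal(0, 0.5)", class = "b", dpar = "phi"),
  set_prior("normal(0, 4)",   class = "b", dpar = "zoi"),
  set_prior("normal(0, 0.2)", class = "b", dpar = "coi"),
  # SDs (half-normal: brms truncates SD priors at zero)
  set_prior("normal(0, 1)", class = "sd"),
  set_prior("normal(0, 1)", class = "sd", dpar = "phi"),
  set_prior("normal(0, 4)", class = "sd", dpar = "zoi"),
  set_prior("normal(0, 5)", class = "sd", dpar = "coi"),
  # SDs by participant (grouping variable name is an assumption)
  set_prior("normal(0, 1)",  class = "sd", group = "participant"),
  set_prior("normal(0, 3)",  class = "sd", group = "participant", dpar = "phi"),
  set_prior("normal(0, 10)", class = "sd", group = "participant", dpar = "zoi"),
  set_prior("normal(0, 10)", class = "sd", group = "participant", dpar = "coi"),
  # Correlation between group-level effects
  set_prior("lkj(2)", class = "cor")
)
```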

For both testing models, the following priors were used:

  • Intercept
    • \(\mu\): \(\mathcal{N}(0, 5)\)
    • \(\phi\): \(\mathcal{N}(3, 3)\)
    • \(\alpha\): \(\mathcal{logistic}(0, 1)\)
    • \(\gamma\): \(\mathcal{logistic}(0, 1)\)
  • Slope
    • \(\mu\): \(\mathcal{N}(0, 0.5)\)
    • \(\phi\): \(\mathcal{N}(0, 1)\)
    • \(\alpha\): \(\mathcal{N}(0, 5)\)
    • \(\gamma\): \(\mathcal{N}(0, 0.5)\)
  • SD
    • \(\mu\): \(\mathcal{N}(0, 1)\)
    • \(\phi\): \(\mathcal{N}(0, 1)\)
    • \(\alpha\): \(\mathcal{N}(0, 5)\)
    • \(\gamma\): \(\mathcal{N}(0, 5)\)
  • SD by Participant Number
    • \(\mu\): \(\mathcal{N}(0, 1)\)
    • \(\phi\): \(\mathcal{N}(0, 3)\)
    • \(\alpha\): \(\mathcal{N}(0, 10)\)
    • \(\gamma\): \(\mathcal{N}(0, 10)\)
  • Correlation
    • \(LKJ(2)\)

Due to the larger number of observations available for analyses during the testing phase, the \(\mu\) and \(\phi\) intercept terms, all slope terms, and the \(\alpha\) SD term use more weakly informative priors than in the exposure model. This allows the data to have a larger impact on parameter estimates without adversely affecting model convergence.

In all models, the approach was to use weakly informative, regularising priors for fitting. Where models failed to converge, these priors were adjusted, typically placing less prior weight on extreme values.

Model Checks

Posterior predictive checks were performed for all three models, comparing the observed sample density against densities sampled from the fitted model. Well-fitting models show concordance between the observed and sampled densities. Plots for each model are displayed below. Grey lines indicate samples from the posterior, while black lines indicate the observed sample density.

As can be seen from the plots, the posterior predictive checks indicate a good model fit in all instances.

Vocabulary Test Model

A summary of the Vocabulary Test model is provided below. This can be used to determine model diagnostics and coefficients. To answer questions pertaining to our pre-registered hypotheses, and to generate plots for these summaries, we used draws from the posterior for different combinations of conditions using the tidybayes R-package (CITATION). Hypothesis tests are then provided in the form of Region of Practical Equivalence (ROPE) analyses from these draws using the bayestestR R-package.
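Condition-level posterior summaries of this kind can be sketched with tidybayes and modelr; the data frame, model object, and column names below are assumptions, not the authors' exact code:

```r
library(tidybayes)
library(modelr)
library(dplyr)

# Hypothetical sketch: draws of the expected nLED per condition,
# combining all four distributional parameters, then summarised as
# posterior means with 80%, 90%, and 95% HDIs.
vocab_data %>%
  data_grid(exposure, word_type) %>%
  add_fitted_draws(vocab_model, re_formula = NA) %>%
  mean_hdi(.value, .width = c(0.80, 0.90, 0.95))
```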

##           Length Class       Mode     
## formula     7    brmsformula list     
## data.name   1    -none-      character
## group       1    -none-      character
## nobs        1    -none-      numeric  
## ngrps       1    -none-      list     
## autocor     0    cor_empty   list     
## prior       8    brmsprior   list     
## algorithm   1    -none-      character
## chains      1    -none-      numeric  
## iter        1    -none-      numeric  
## warmup      1    -none-      numeric  
## thin        1    -none-      numeric  
## sampler     1    -none-      character
## fixed     192    -none-      numeric  
## spec_pars   0    -none-      numeric  
## cor_pars    0    -none-      numeric  
## random      1    -none-      list

Posterior means and 80% and 90% credible intervals are provided for different conditions in the plots below. Table summaries also provide posterior means along with credible intervals at the 80%, 90%, and 95% ranges.

In all following plots and reported statistics, summaries are provided only for the overall nLEDs, taking into account all distributional parameters when sampling from the posterior. Thus, we do not provide individual statistics and plots for the individual distributional terms (e.g. zero-one inflation or conditional one-inflation), as we did not specify any hypotheses related to these terms. Instead, the zero-one inflated Beta models are used purely to improve model fit and to make more accurate predictions about the overall differences in nLEDs across conditions.

Variety Exposure

Posterior means and credible intervals are provided in the table below.

Credible Interval
Variety Exposure Mean Width Interval
Variety Match 0.652 0.95 [0.608, 0.695]
Variety Mismatch 0.604 0.95 [0.559, 0.648]
Variety Mismatch Social 0.640 0.95 [0.597, 0.682]
Dialect Literacy 0.635 0.95 [0.588, 0.680]
Variety Match 0.652 0.90 [0.615, 0.688]
Variety Mismatch 0.604 0.90 [0.566, 0.641]
Variety Mismatch Social 0.640 0.90 [0.604, 0.676]
Dialect Literacy 0.635 0.90 [0.596, 0.673]
Variety Match 0.652 0.80 [0.624, 0.680]
Variety Mismatch 0.604 0.80 [0.575, 0.633]
Variety Mismatch Social 0.640 0.80 [0.612, 0.668]
Dialect Literacy 0.635 0.80 [0.604, 0.665]

We evaluated equivalence in nLEDs between Variety Exposure conditions by determining a region of practical equivalence (ROPE), within which any effects are deemed practically equivalent to 0. In all instances, this is determined at the 90% credible interval (CI) bound of the highest density interval (HDI), which is typically more stable than larger bounds (CITATION: Kruschke). We report the proportion of the HDI contained within the ROPE, along with the bounds of this interval. Where HDIs are entirely contained by the equivalence bounds, equivalence is accepted. Where HDIs are entirely outside the equivalence bounds, equivalence is rejected. Uncertainty is assigned to any HDIs that cross the equivalence bounds in either (or both) directions.
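The decision rule described here can be sketched in base R. In practice this is what bayestestR computes; the simple sort-based HDI approximation below is an illustration, not the package's implementation:

```r
# Narrowest interval containing ci% of the posterior draws (simple
# sort-based HDI approximation).
hdi <- function(draws, ci = 0.90) {
  sorted <- sort(draws)
  n      <- length(draws)
  width  <- ceiling(ci * n)
  starts <- seq_len(n - width)
  i <- which.min(sorted[starts + width] - sorted[starts])
  c(sorted[i], sorted[i + width])
}

# ROPE decision: accept if the HDI lies entirely inside the ROPE,
# reject if entirely outside, otherwise undecided. Also reports the
# percentage of the HDI contained within the ROPE.
rope_decision <- function(draws, rope = c(-0.035, 0.035), ci = 0.90) {
  h      <- hdi(draws, ci)
  in_hdi <- draws[draws >= h[1] & draws <= h[2]]
  pct    <- 100 * mean(in_hdi >= rope[1] & in_hdi <= rope[2])
  decision <- if (h[1] >= rope[1] && h[2] <= rope[2]) "Accepted"
              else if (h[2] < rope[1] || h[1] > rope[2]) "Rejected"
              else "Undecided"
  list(percentage = pct, equivalence = decision, interval = h)
}
```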

ROPE Coverage
Comparison Percentage Equivalence Interval
Variety Mismatch - Variety Match 30.794 Undecided [-0.098, 0.000]
Variety Mismatch Social - Variety Match 80.816 Undecided [-0.060, 0.038]
Dialect Literacy - Variety Match 73.711 Undecided [-0.069, 0.035]
Variety Mismatch Social - Variety Mismatch 48.083 Undecided [-0.013, 0.087]
Dialect Literacy - Variety Mismatch 55.682 Undecided [-0.021, 0.086]
Dialect Literacy - Variety Mismatch Social 80.674 Undecided [-0.056, 0.047]
Note:
ROPE range = [-0.035, 0.035]. ROPE determined at the 90% CI of the HDI.

While nLEDs in the Variety Mismatch condition are generally lower than those in the Variety Match condition, 30% of the difference scores are contained by the equivalence bounds. Similarly, while nLEDs are generally higher in the two intervention conditions (Variety Mismatch Social and Dialect Literacy) when compared to the Variety Mismatch condition, around half of the difference scores are contained by the equivalence bounds. All other differences are largely undecided.

Word Type by Variety Exposure

We also looked at whether there are any differences in performance for different word types across conditions during the vocabulary testing phase.

Posterior means and credible intervals are provided in the table below.

Credible Interval
Variety Exposure Word Type Mean Width Interval
Variety Match Non-Contrastive 0.635 0.95 [0.588, 0.681]
Variety Match Contrastive 0.669 0.95 [0.622, 0.714]
Variety Mismatch Non-Contrastive 0.591 0.95 [0.541, 0.638]
Variety Mismatch Contrastive 0.617 0.95 [0.571, 0.663]
Variety Mismatch Social Non-Contrastive 0.621 0.95 [0.573, 0.666]
Variety Mismatch Social Contrastive 0.660 0.95 [0.615, 0.702]
Dialect Literacy Non-Contrastive 0.607 0.95 [0.554, 0.658]
Dialect Literacy Contrastive 0.663 0.95 [0.618, 0.706]
Variety Match Non-Contrastive 0.635 0.90 [0.596, 0.674]
Variety Match Contrastive 0.669 0.90 [0.630, 0.707]
Variety Mismatch Non-Contrastive 0.591 0.90 [0.550, 0.631]
Variety Mismatch Contrastive 0.617 0.90 [0.579, 0.655]
Variety Mismatch Social Non-Contrastive 0.621 0.90 [0.581, 0.659]
Variety Mismatch Social Contrastive 0.660 0.90 [0.623, 0.696]
Dialect Literacy Non-Contrastive 0.607 0.90 [0.562, 0.650]
Dialect Literacy Contrastive 0.663 0.90 [0.625, 0.699]
Variety Match Non-Contrastive 0.635 0.80 [0.605, 0.666]
Variety Match Contrastive 0.669 0.80 [0.638, 0.699]
Variety Mismatch Non-Contrastive 0.591 0.80 [0.559, 0.622]
Variety Mismatch Contrastive 0.617 0.80 [0.587, 0.648]
Variety Mismatch Social Non-Contrastive 0.621 0.80 [0.590, 0.651]
Variety Mismatch Social Contrastive 0.660 0.80 [0.631, 0.688]
Dialect Literacy Non-Contrastive 0.607 0.80 [0.572, 0.640]
Dialect Literacy Contrastive 0.663 0.80 [0.634, 0.691]
ROPE Coverage
Group Comparison Percentage Equivalence Interval
Variety Match Contrastive - Non-Contrastive 18.085 Undecided [0.005, 0.061]
Variety Mismatch Contrastive - Non-Contrastive 31.714 Undecided [0.000, 0.052]
Variety Mismatch Social Contrastive - Non-Contrastive 4.691 Undecided [0.015, 0.063]
Dialect Literacy Contrastive - Non-Contrastive 0.000 Rejected [0.032, 0.080]
Note:
ROPE range = [-0.02, 0.02]. ROPE determined at the 90% CI of the HDI.

In all instances there is some evidence that performance is better for non-contrastive words relative to contrastive words. However, equivalence between the two word types is only confidently rejected for the Dialect Literacy condition. Thus, it is likely that performance was worse for contrastive words relative to non-contrastive words in the Dialect Literacy condition in the Vocabulary Test.

Testing Phase Model

A summary of the Testing Phase model is provided below. This can be used to determine model diagnostics and coefficients. As with the Vocabulary Test Model, to answer questions pertaining to our pre-registered hypotheses, and to generate plots for these summaries, we used draws from the posterior for different combinations of conditions using the tidybayes R-package (CITATION). Similarly, hypothesis tests are provided in the form of Region of Practical Equivalence (ROPE) analyses from these draws using the bayestestR R-package.

##           Length Class       Mode     
## formula     7    brmsformula list     
## data.name   1    -none-      character
## group       1    -none-      character
## nobs        1    -none-      numeric  
## ngrps       1    -none-      list     
## autocor     0    cor_empty   list     
## prior       8    brmsprior   list     
## algorithm   1    -none-      character
## chains      1    -none-      numeric  
## iter        1    -none-      numeric  
## warmup      1    -none-      numeric  
## thin        1    -none-      numeric  
## sampler     1    -none-      character
## fixed     576    -none-      numeric  
## spec_pars   0    -none-      numeric  
## cor_pars    0    -none-      numeric  
## random      1    -none-      list

Variety Exposure

Posterior means and credible intervals are provided in the table below.

ROPE Coverage
Comparison Percentage Equivalence Interval
Variety Match - Variety Mismatch 68.564 Undecided [-0.074, 0.055]
Variety Match - Variety Mismatch Social 47.738 Undecided [-0.103, 0.027]
Variety Match - Dialect Literacy 45.127 Undecided [-0.107, 0.029]
Variety Mismatch - Variety Mismatch Social 56.379 Undecided [-0.094, 0.039]
Variety Mismatch Social - Dialect Literacy 63.274 Undecided [-0.074, 0.073]
Note:
ROPE range = [-0.035, 0.035]. ROPE determined at the 90% CI of the HDI.

While the ROPE analyses are undecided, there do not seem to be any reliable differences across the Variety Exposure conditions with regard to overall performance.

Variety Exposure for Novel Words Only

Posterior means and credible intervals are provided in the table below.

ROPE Coverage
Comparison Percentage Equivalence Interval
Variety Match - Variety Mismatch 61.428 Undecided [-0.085, 0.053]
Variety Match - Variety Mismatch Social 47.744 Undecided [-0.107, 0.029]
Variety Match - Dialect Literacy 41.497 Undecided [-0.116, 0.027]
Variety Mismatch - Variety Mismatch Social 59.910 Undecided [-0.087, 0.056]
Variety Mismatch Social - Dialect Literacy 59.793 Undecided [-0.088, 0.066]
Note:
ROPE range = [-0.035, 0.035]. ROPE determined at the 90% CI of the HDI.

A similar pattern emerges for the novel words alone as for all words combined.

Word Type by Variety Exposure

We also looked at whether there are any differences in performance for different word types across conditions during the testing phase.

There appear to be some differences between word types within groups, but are these differences reliable?

It seems that there is a clear effect of word type (contrastive vs. non-contrastive words) for both tasks in the Dialect Literacy condition, and for reading only in the Variety Mismatch Social condition. Additionally, there is some evidence for an effect of word type in the Variety Mismatch condition, with only around 6% of the HDI within the equivalence bounds. This suggests that performance is impaired for contrastive words only when participants are exposed to a dialect.

Posterior means and credible intervals are provided in the table below.

Credible Interval
Task Variety Exposure Word Type Mean Width Interval
Reading Variety Match Non-Contrastive 0.211 0.95 [0.164, 0.265]
Reading Variety Match Contrastive 0.220 0.95 [0.171, 0.277]
Reading Variety Match Novel 0.258 0.95 [0.201, 0.322]
Reading Variety Mismatch Non-Contrastive 0.207 0.95 [0.160, 0.261]
Reading Variety Mismatch Contrastive 0.245 0.95 [0.193, 0.305]
Reading Variety Mismatch Novel 0.271 0.95 [0.210, 0.339]
Reading Variety Mismatch Social Non-Contrastive 0.228 0.95 [0.180, 0.285]
Reading Variety Mismatch Social Contrastive 0.282 0.95 [0.226, 0.345]
Reading Variety Mismatch Social Novel 0.284 0.95 [0.218, 0.357]
Reading Dialect Literacy Non-Contrastive 0.208 0.95 [0.158, 0.266]
Reading Dialect Literacy Contrastive 0.305 0.95 [0.241, 0.374]
Reading Dialect Literacy Novel 0.292 0.95 [0.230, 0.359]
Spelling Variety Match Non-Contrastive 0.302 0.95 [0.245, 0.367]
Spelling Variety Match Contrastive 0.313 0.95 [0.255, 0.379]
Spelling Variety Match Novel 0.285 0.95 [0.227, 0.349]
Spelling Variety Mismatch Non-Contrastive 0.310 0.95 [0.251, 0.376]
Spelling Variety Mismatch Contrastive 0.304 0.95 [0.245, 0.369]
Spelling Variety Mismatch Novel 0.309 0.95 [0.247, 0.379]
Spelling Variety Mismatch Social Non-Contrastive 0.343 0.95 [0.283, 0.410]
Spelling Variety Mismatch Social Contrastive 0.346 0.95 [0.284, 0.414]
Spelling Variety Mismatch Social Novel 0.331 0.95 [0.268, 0.398]
Spelling Dialect Literacy Non-Contrastive 0.312 0.95 [0.250, 0.377]
Spelling Dialect Literacy Contrastive 0.380 0.95 [0.310, 0.451]
Spelling Dialect Literacy Novel 0.338 0.95 [0.270, 0.408]
Reading Variety Match Non-Contrastive 0.211 0.90 [0.171, 0.256]
Reading Variety Match Contrastive 0.220 0.90 [0.178, 0.266]
Reading Variety Match Novel 0.258 0.90 [0.209, 0.310]
Reading Variety Mismatch Non-Contrastive 0.207 0.90 [0.167, 0.251]
Reading Variety Mismatch Contrastive 0.245 0.90 [0.200, 0.295]
Reading Variety Mismatch Novel 0.271 0.90 [0.219, 0.327]
Reading Variety Mismatch Social Non-Contrastive 0.228 0.90 [0.186, 0.276]
Reading Variety Mismatch Social Contrastive 0.282 0.90 [0.234, 0.334]
Reading Variety Mismatch Social Novel 0.284 0.90 [0.228, 0.345]
Reading Dialect Literacy Non-Contrastive 0.208 0.90 [0.165, 0.255]
Reading Dialect Literacy Contrastive 0.305 0.90 [0.251, 0.363]
Reading Dialect Literacy Novel 0.292 0.90 [0.239, 0.348]
Spelling Variety Match Non-Contrastive 0.302 0.90 [0.253, 0.355]
Spelling Variety Match Contrastive 0.313 0.90 [0.263, 0.367]
Spelling Variety Match Novel 0.285 0.90 [0.236, 0.338]
Spelling Variety Mismatch Non-Contrastive 0.310 0.90 [0.259, 0.365]
Spelling Variety Mismatch Contrastive 0.304 0.90 [0.254, 0.358]
Spelling Variety Mismatch Novel 0.309 0.90 [0.255, 0.366]
Spelling Variety Mismatch Social Non-Contrastive 0.343 0.90 [0.291, 0.399]
Spelling Variety Mismatch Social Contrastive 0.346 0.90 [0.293, 0.402]
Spelling Variety Mismatch Social Novel 0.331 0.90 [0.277, 0.387]
Spelling Dialect Literacy Non-Contrastive 0.312 0.90 [0.260, 0.367]
Spelling Dialect Literacy Contrastive 0.380 0.90 [0.321, 0.439]
Spelling Dialect Literacy Novel 0.338 0.90 [0.281, 0.396]
Reading Variety Match Non-Contrastive 0.211 0.80 [0.179, 0.244]
Reading Variety Match Contrastive 0.220 0.80 [0.187, 0.256]
Reading Variety Match Novel 0.258 0.80 [0.219, 0.297]
Reading Variety Mismatch Non-Contrastive 0.207 0.80 [0.175, 0.240]
Reading Variety Mismatch Contrastive 0.245 0.80 [0.209, 0.283]
Reading Variety Mismatch Novel 0.271 0.80 [0.230, 0.314]
Reading Variety Mismatch Social Non-Contrastive 0.228 0.80 [0.194, 0.265]
Reading Variety Mismatch Social Contrastive 0.282 0.80 [0.243, 0.322]
Reading Variety Mismatch Social Novel 0.284 0.80 [0.239, 0.331]
Reading Dialect Literacy Non-Contrastive 0.208 0.80 [0.174, 0.244]
Reading Dialect Literacy Contrastive 0.305 0.80 [0.263, 0.350]
Reading Dialect Literacy Novel 0.292 0.80 [0.250, 0.335]
Spelling Variety Match Non-Contrastive 0.302 0.80 [0.263, 0.343]
Spelling Variety Match Contrastive 0.313 0.80 [0.274, 0.355]
Spelling Variety Match Novel 0.285 0.80 [0.246, 0.326]
Spelling Variety Mismatch Non-Contrastive 0.310 0.80 [0.270, 0.353]
Spelling Variety Mismatch Contrastive 0.304 0.80 [0.264, 0.346]
Spelling Variety Mismatch Novel 0.309 0.80 [0.266, 0.353]
Spelling Variety Mismatch Social Non-Contrastive 0.343 0.80 [0.302, 0.387]
Spelling Variety Mismatch Social Contrastive 0.346 0.80 [0.304, 0.390]
Spelling Variety Mismatch Social Novel 0.331 0.80 [0.288, 0.375]
Spelling Dialect Literacy Non-Contrastive 0.312 0.80 [0.271, 0.355]
Spelling Dialect Literacy Contrastive 0.380 0.80 [0.334, 0.426]
Spelling Dialect Literacy Novel 0.338 0.80 [0.293, 0.383]

We can also directly compare the differences in performance for contrastive words relative to non-contrastive words.

Credible Interval
Task Variety Exposure Word Type Mean Width Interval
Reading Variety Match Contrastive - Non-Contrastive 0.009 0.95 [-0.021, 0.039]
Reading Variety Mismatch Contrastive - Non-Contrastive 0.038 0.95 [0.011, 0.067]
Reading Variety Mismatch Social Contrastive - Non-Contrastive 0.053 0.95 [0.024, 0.084]
Reading Dialect Literacy Contrastive - Non-Contrastive 0.097 0.95 [0.063, 0.134]
Spelling Variety Match Contrastive - Non-Contrastive 0.011 0.95 [-0.011, 0.034]
Spelling Variety Mismatch Contrastive - Non-Contrastive -0.006 0.95 [-0.029, 0.017]
Spelling Variety Mismatch Social Contrastive - Non-Contrastive 0.002 0.95 [-0.021, 0.026]
Spelling Dialect Literacy Contrastive - Non-Contrastive 0.068 0.95 [0.044, 0.092]
Reading Variety Match Contrastive - Non-Contrastive 0.009 0.90 [-0.016, 0.034]
Reading Variety Mismatch Contrastive - Non-Contrastive 0.038 0.90 [0.015, 0.062]
Reading Variety Mismatch Social Contrastive - Non-Contrastive 0.053 0.90 [0.029, 0.079]
Reading Dialect Literacy Contrastive - Non-Contrastive 0.097 0.90 [0.068, 0.128]
Spelling Variety Match Contrastive - Non-Contrastive 0.011 0.90 [-0.008, 0.030]
Spelling Variety Mismatch Contrastive - Non-Contrastive -0.006 0.90 [-0.025, 0.013]
Spelling Variety Mismatch Social Contrastive - Non-Contrastive 0.002 0.90 [-0.017, 0.022]
Spelling Dialect Literacy Contrastive - Non-Contrastive 0.068 0.90 [0.047, 0.088]
Reading Variety Match Contrastive - Non-Contrastive 0.009 0.80 [-0.010, 0.028]
Reading Variety Mismatch Contrastive - Non-Contrastive 0.038 0.80 [0.020, 0.056]
Reading Variety Mismatch Social Contrastive - Non-Contrastive 0.053 0.80 [0.034, 0.073]
Reading Dialect Literacy Contrastive - Non-Contrastive 0.097 0.80 [0.074, 0.121]
Spelling Variety Match Contrastive - Non-Contrastive 0.011 0.80 [-0.004, 0.026]
Spelling Variety Mismatch Contrastive - Non-Contrastive -0.006 0.80 [-0.021, 0.009]
Spelling Variety Mismatch Social Contrastive - Non-Contrastive 0.002 0.80 [-0.013, 0.017]
Spelling Dialect Literacy Contrastive - Non-Contrastive 0.068 0.80 [0.052, 0.084]

These results reflect those in the plots above. Are any differences reported here reliable?

ROPE Coverage
Task Variety Exposure Comparison Percentage Equivalence Interval
Reading Variety Match Contrastive - Non-Contrastive 80.606 Undecided [-0.016, 0.033]
Reading Variety Mismatch Contrastive - Non-Contrastive 6.401 Undecided [0.014, 0.061]
Reading Variety Mismatch Social Contrastive - Non-Contrastive 0.000 Rejected [0.028, 0.078]
Reading Dialect Literacy Contrastive - Non-Contrastive 0.000 Rejected [0.067, 0.128]
Spelling Variety Match Contrastive - Non-Contrastive 80.668 Undecided [-0.007, 0.030]
Spelling Variety Mismatch Contrastive - Non-Contrastive 92.766 Undecided [-0.025, 0.014]
Spelling Variety Mismatch Social Contrastive - Non-Contrastive 97.741 Undecided [-0.016, 0.022]
Spelling Dialect Literacy Contrastive - Non-Contrastive 0.000 Rejected [0.047, 0.088]
Note:
ROPE range = [-0.02, 0.02]. ROPE determined at the 90% CI of the HDI.

Equivalence is confidently rejected for the reading task for the two intervention conditions (Variety Mismatch Social and Dialect Literacy). Additionally, for the reading task the majority of the HDI for the Variety Mismatch condition is outside of the ROPE. For the spelling task, equivalence is rejected for the Dialect Literacy condition. However, all other contrasts show that most of the HDI is contained by the ROPE (suggesting equivalence).

Finally, we ask whether or not the contrastive effect is stronger in the Variety Mismatch Social condition relative to the Variety Mismatch condition.

ROPE Coverage
Task Word Type Variety Exposure Percentage Equivalence Interval
Reading Contrastive - Non-Contrastive Variety Mismatch Social - Variety Mismatch 59.965 Undecided [-0.019, 0.050]
Spelling Contrastive - Non-Contrastive Variety Mismatch Social - Variety Mismatch 78.507 Undecided [-0.019, 0.036]
Note:
ROPE range = [-0.02, 0.02]. ROPE determined at the 90% CI of the HDI.

Any differences here are slight, and are mainly contained by the ROPE in both tasks. This suggests that there are no substantial differences in the magnitude of the effect between contrastive and non-contrastive words across these two conditions.

Exploratory Covariate Testing Model

A summary of the Testing Phase model incorporating the mean scores in the vocabulary test as a covariate is provided below. This can be used to determine model diagnostics and coefficients. As with previous models, draws from the posterior for different combinations of conditions were taken using the tidybayes R-package (CITATION). Similarly, hypothesis tests are provided in the form of Region of Practical Equivalence (ROPE) analyses from these draws using the bayestestR R-package. Extreme caution is needed for interpreting such hypothesis tests as the following is purely exploratory.

##           Length Class       Mode     
## formula      7   brmsformula list     
## data.name    1   -none-      character
## group        1   -none-      character
## nobs         1   -none-      numeric  
## ngrps        1   -none-      list     
## autocor      0   cor_empty   list     
## prior        8   brmsprior   list     
## algorithm    1   -none-      character
## chains       1   -none-      numeric  
## iter         1   -none-      numeric  
## warmup       1   -none-      numeric  
## thin         1   -none-      numeric  
## sampler      1   -none-      character
## fixed     1152   -none-      numeric  
## spec_pars    0   -none-      numeric  
## cor_pars     0   -none-      numeric  
## random       1   -none-      list

We first explored whether mean vocabulary test performance (in nLED) predicts testing performance, and whether or not this varies across Task, Variety Exposure condition, and Word Type. A plot of this relationship is shown below.

Variety Exposure for Novel Words Only

It’s quite difficult to make out an overall pattern here, so instead we performed a median split on vocabulary test performance, categorising participants into those with high and low nLEDs in the vocabulary test.
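A median split of this kind can be sketched in base R on toy data (the real analysis uses one mean nLED per participant; all names and values below are illustrative):

```r
# Hypothetical median split on mean vocabulary-test nLEDs (toy data).
vocab_means <- data.frame(
  participant = 1:6,
  mean_nled   = c(0.40, 0.55, 0.60, 0.65, 0.70, 0.80)
)
vocab_means$nled_group <- ifelse(
  vocab_means$mean_nled > median(vocab_means$mean_nled),
  "High", "Low"
)
table(vocab_means$nled_group)  # High: 3, Low: 3
```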

We focused specifically on novel words to see whether any differences occur for novel word decoding. The following analyses summarise the patterns in the covariate model.

It’s clear from the figure that performance is generally worse in the testing phase for those with high mean nLEDs during the vocabulary test. Are there any differences if we directly compare these difference scores?

Variety Match and Mismatch seem to differ for the spelling task, while all other contrasts seem to indicate equivalent performance across conditions. Is this borne out in the data?

ROPE Coverage
Vocab Test nLED Group Task Variety Exposure Percentage Equivalence Interval
High - Low Reading Variety Mismatch - Variety Match 24.195 Undecided [-0.071, 0.135]
High - Low Reading Variety Mismatch Social - Variety Match 21.421 Undecided [-0.063, 0.154]
High - Low Reading Dialect Literacy - Variety Match 20.089 Undecided [-0.066, 0.168]
High - Low Reading Variety Mismatch Social - Variety Mismatch 26.193 Undecided [-0.107, 0.138]
High - Low Reading Dialect Literacy - Variety Mismatch 20.755 Undecided [-0.107, 0.150]
High - Low Reading Dialect Literacy - Variety Mismatch Social 23.973 Undecided [-0.135, 0.120]
High - Low Spelling Variety Mismatch - Variety Match 0.000 Rejected [0.032, 0.226]
High - Low Spelling Variety Mismatch Social - Variety Match 4.329 Undecided [0.002, 0.203]
High - Low Spelling Dialect Literacy - Variety Match 12.986 Undecided [-0.083, 0.183]
High - Low Spelling Variety Mismatch Social - Variety Mismatch 26.526 Undecided [-0.118, 0.084]
High - Low Spelling Dialect Literacy - Variety Mismatch 18.202 Undecided [-0.186, 0.076]
High - Low Spelling Dialect Literacy - Variety Mismatch Social 20.866 Undecided [-0.147, 0.116]
Note:
ROPE range = [-0.02, 0.02]. ROPE determined at the 90% CI of the HDI.

Here, there is evidence that spelling performance between those with high and low nLEDs in the vocabulary test varies between the Variety Match and Variety Mismatch conditions. Viewing the first figure, it is clear that there is a larger discrepancy in performance between those with high and low nLEDs in the vocabulary test in the Variety Mismatch condition relative to the Variety Match condition. While there is a similar trend in the other two variety conditions (i.e. Variety Mismatch Social and Dialect Literacy), there is not enough evidence here to support the claim that the differences between those with high and low nLEDs in the vocabulary test vary substantially in the Variety Mismatch Social and Dialect Literacy conditions relative to the Variety Match condition.

Next, we looked at whether or not contrastive words differed to non-contrastive words across the high and low nLED groups. First off, what do the nLEDs look like in these groups?

That’s quite a lot to parse in one go, so we can instead look at the difference scores between contrastive and non-contrastive words for this same comparison.

In the Variety Mismatch condition, the difference in the reading task between the two word types seems to stem from those with low nLEDs during the vocabulary test, while the difference is consistent across groups in both of the intervention conditions. However, any effect in spelling is only found for those with high nLEDs in the vocabulary test following the Dialect Literacy training. Do the data back this up?

ROPE Coverage
Variety Exposure Task Vocab Test nLED Group Word Type Percentage Equivalence Interval
Variety Match Reading Low Contrastive - Non-Contrastive 52.941 Undecided [-0.092, 0.034]
Variety Mismatch Reading Low Contrastive - Non-Contrastive 4.661 Undecided [0.012, 0.080]
Variety Mismatch Social Reading Low Contrastive - Non-Contrastive 9.767 Undecided [0.010, 0.062]
Dialect Literacy Reading Low Contrastive - Non-Contrastive 0.000 Rejected [0.060, 0.139]
Variety Match Spelling Low Contrastive - Non-Contrastive 70.588 Undecided [-0.022, 0.041]
Variety Mismatch Spelling Low Contrastive - Non-Contrastive 71.698 Undecided [-0.022, 0.048]
Variety Mismatch Social Spelling Low Contrastive - Non-Contrastive 71.920 Undecided [-0.042, 0.018]
Dialect Literacy Spelling Low Contrastive - Non-Contrastive 11.099 Undecided [-0.019, 0.111]
Variety Match Reading High Contrastive - Non-Contrastive 33.296 Undecided [-0.010, 0.068]
Variety Mismatch Reading High Contrastive - Non-Contrastive 49.723 Undecided [-0.021, 0.058]
Variety Mismatch Social Reading High Contrastive - Non-Contrastive 0.000 Rejected [0.026, 0.112]
Dialect Literacy Reading High Contrastive - Non-Contrastive 0.000 Rejected [0.026, 0.125]
Variety Match Spelling High Contrastive - Non-Contrastive 80.910 Undecided [-0.020, 0.034]
Variety Mismatch Spelling High Contrastive - Non-Contrastive 42.286 Undecided [-0.052, 0.011]
Variety Mismatch Social Spelling High Contrastive - Non-Contrastive 55.383 Undecided [-0.015, 0.049]
Dialect Literacy Spelling High Contrastive - Non-Contrastive 0.000 Rejected [0.035, 0.097]
Note:
ROPE range = [-0.02, 0.02]. ROPE determined at the 90% CI of the HDI.

NOTE: The number of posterior samples for the covariate model is relatively low when using the median split. Increasing the number of samples may give more stable estimates.